Probabilistic analysis in R starts with turning raw observations into structured objects. Before modeling distributions, we need to understand how data is ingested and how lists, matrices, and data frames differ in structure.
1. Structured Ingestion
Importing data with scan() often requires a template list, passed as the what argument, whose component values define the type of each field (e.g., list(id="", x=0) for a character field followed by a numeric one). Data from a file such as input.dat is then returned as a list of named components rather than a single flat vector.
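The following is a minimal sketch of this pattern. The file contents and the temporary path are made up for illustration and stand in for a real file such as input.dat.

```r
# Write a small illustrative file (hypothetical contents, two fields per record).
path <- tempfile(fileext = ".dat")
writeLines(c("a 1.5", "b 2.0", "c 3.25"), path)

# The template list gives each field a name and a type:
# "" marks a character field, 0 marks a numeric field.
inp <- scan(path, what = list(id = "", x = 0), quiet = TRUE)

# inp is a list with one component per field, not a flat vector.
str(inp)
```

Because the result is a list, each field keeps its own type: inp$id stays character while inp$x is numeric.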
2. Dimensional Organization
A matrix holds elements of a single type, typically numeric, and is filled column by column unless byrow=TRUE is given. A data frame, created with data.frame(), lets columns of different types coexist, which makes it the standard structure for statistical modeling.
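A short sketch of the difference, using made-up values. The variable names are illustrative only.

```r
# Default fill order is column by column; byrow = TRUE fills row by row.
m_col <- matrix(1:6, nrow = 2)
m_row <- matrix(1:6, nrow = 2, byrow = TRUE)

# A matrix has one type for all elements: mixing numbers and strings
# coerces every element to character.
mixed <- matrix(c(1, "a", 2, "b"), nrow = 2, byrow = TRUE)

# A data frame keeps a separate type per column.
# stringsAsFactors = FALSE is set explicitly so that the id column stays
# character on older R versions as well.
df <- data.frame(id = c("a", "b"), x = c(1.5, 2.0), stringsAsFactors = FALSE)

print(m_col)
print(m_row)
print(typeof(mixed))
print(sapply(df, class))
```

The first row of m_row is 1, 2, 3, while the first row of m_col is 1, 3, 5, which is the practical effect of byrow=TRUE.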
3. Variable Accessibility
Components are accessed by position, as in inp[[1]], or by name, as in inp$id. attach() places the columns of a data frame on the search path so that they can be referenced by name alone (for example, eruptions), without repeated indexing.
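The access patterns can be sketched as follows. The list inp is rebuilt by hand here with made-up values, and the example assumes that eruptions refers to the column of that name in R's built-in faithful data set.

```r
# A hand-built list standing in for the result of a scan() call.
inp <- list(id = c("a", "b", "c"), x = c(1.5, 2.0, 3.25))

by_position <- inp[[1]]   # first component, by position
by_name <- inp$id         # same component, by name

# attach() puts the columns of a data frame on the search path.
attach(faithful)
mean_attached <- mean(eruptions)   # no faithful$ prefix needed
detach(faithful)                   # remove it again to avoid name clashes

# Equivalent explicit indexing.
mean_indexed <- mean(faithful$eruptions)
```

Calling detach() once the work is done keeps attached columns from masking other objects with the same name. with(faithful, mean(eruptions)) gives the same result without changing the search path.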